… of Nucleotide Substitutions from Restriction Sites Data

Authors

  • M. NEI
  • F. TAJIMA
Abstract

A simple method for maximum likelihood estimation of the number of nucleotide substitutions is presented for the case where restriction sites data from many different restriction enzymes are available. An iteration method based on nucleotide counting is also developed; it is simpler than the maximum likelihood method but gives the same estimate. A formula for computing the variance of a maximum likelihood estimate is also presented.

The number of nucleotide substitutions between a pair of homologous DNAs can be estimated from data on restriction enzyme cleavage sites (UPHOLT 1977; NEI and LI 1979; KAPLAN and LANGLEY 1979; GOTOH et al. 1979). When all the restriction enzymes used have the same number of nucleotides in their recognition sequence, the number of nucleotide differences per site can be estimated by a simple formula. However, when restriction enzymes with different numbers of recognition nucleotides are used, a rather complicated maximum likelihood procedure is required (KAPLAN and LANGLEY 1979; KAPLAN and RISKO 1981). GOTOH et al. (1979) used a relatively simple maximum likelihood method, but their formulation does not seem to be accurate. In the following, we present a simple method of maximum likelihood estimation. We also present a method for estimating the number of nucleotide substitutions by nucleotide counting and show that it gives the maximum likelihood estimate.

MAXIMUM LIKELIHOOD METHOD

We assume that the four types of nucleotides (T, C, A, G) are randomly arranged in the DNA sequence under investigation and that evolutionary change of the DNA sequence occurs solely by random nucleotide substitution. (See DISCUSSION for the effect of violating this assumption.)
In this case the expected number of restriction sites for a restriction enzyme with a recognition sequence of $r$ nucleotides (usually $r = 4$ or $6$) is given by $m_T\alpha$, where $m_T$ is the total number of nucleotides and $\alpha$ is the probability that a sequence of $r$ nucleotides in the DNA is a restriction site. In general,

$$\alpha = g_1^{r_1} g_2^{r_2} g_3^{r_3} g_4^{r_4},$$

where $g_1$, $g_2$, $g_3$ and $g_4$ are the frequencies of nucleotides T, C, A and G, respectively, in the 5'-3' strand of DNA, and $r_1$, $r_2$, $r_3$ and $r_4$ are the numbers of T, C, A and G in the recognition sequence, respectively ($\sum_i r_i = r$). For example, the recognition sequence of EcoRI is GAATTC, so that $r_1 = 2$, $r_2 = 1$, $r_3 = 2$ and $r_4 = 1$. When a restriction enzyme recognizes more than one type of recognition sequence (e.g., HcreI), $\alpha$ is given by a somewhat different formula, as will be discussed later. Usually, $\alpha$ is much smaller than 1.

We consider two DNA sequences (X and Y) that diverged $t$ years (or generations) ago and compare all possible restriction sites of the two sequences. Note that in a circular DNA of $m_T$ nucleotides there are $m_T$ possible restriction sites (NEI and LI 1979). In a linear DNA the possible number of restriction sites is $m_T - r + 1$. However, $m_T$ is usually much larger than $r$, so that the possible number is again approximately $m_T$.

In the comparison of restriction sites between two DNA sequences, there are four different cases. A sequence of $r$ nucleotides at a particular location of the DNA can be a restriction site (1) for both X and Y, (2) for X but not for Y, (3) for Y but not for X, and (4) for neither X nor Y. Let $m_x$ and $m_y$ be the numbers of restriction sites for DNA sequences X and Y, respectively, and $m_{xy}$ be the number of restriction sites shared by X and Y. The numbers of observations for these four events are then $m_{xy}$, $m_x - m_{xy}$, $m_y - m_{xy}$ and $m_T - m_x - m_y + m_{xy}$, respectively.
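The quantities defined in the preceding paragraphs are straightforward to compute. The Python sketch below evaluates $\alpha$ for an enzyme with a unique recognition sequence, gives the number of possible sites for circular or linear DNA, and tabulates the four observed event counts. The function names (`site_probability`, `possible_sites`, `event_counts`) and all example numbers are hypothetical, chosen only for illustration; enzymes that recognize more than one sequence are deliberately rejected rather than handled.

```python
from collections import Counter

# Order used in the text: g1..g4 and r1..r4 refer to T, C, A, G.
BASES = ("T", "C", "A", "G")


def site_probability(recognition_seq: str, freqs: dict) -> float:
    """alpha = g1^r1 * g2^r2 * g3^r3 * g4^r4 for a unique recognition sequence.

    `freqs` maps each base to its frequency in the 5'-3' strand.
    Ambiguous recognition sequences are not handled in this sketch.
    """
    counts = Counter(recognition_seq.upper())  # r1..r4 as a base -> count map
    if set(counts) - set(BASES):
        raise ValueError("only unambiguous T/C/A/G recognition sequences are supported")
    alpha = 1.0
    for base in BASES:
        alpha *= freqs[base] ** counts[base]
    return alpha


def possible_sites(m_T: int, r: int, circular: bool = True) -> int:
    """Possible r-nucleotide sites: m_T for circular DNA, m_T - r + 1 for linear DNA."""
    return m_T if circular else m_T - r + 1


def event_counts(m_x: int, m_y: int, m_xy: int, m_T: int) -> tuple:
    """Observed numbers of the four events among m_T possible sites."""
    return (
        m_xy,                    # (1) site in both X and Y
        m_x - m_xy,              # (2) site in X but not in Y
        m_y - m_xy,              # (3) site in Y but not in X
        m_T - m_x - m_y + m_xy,  # (4) site in neither X nor Y
    )


equal = {"T": 0.25, "C": 0.25, "A": 0.25, "G": 0.25}
print(site_probability("GAATTC", equal))  # EcoRI with equal base frequencies: 0.25**6
print(event_counts(20, 18, 15, 16000))    # hypothetical counts
```

For EcoRI (GAATTC) with equal base frequencies, `site_probability` returns $(1/4)^6 = 1/4096$, and the hypothetical counts $m_x = 20$, $m_y = 18$, $m_{xy} = 15$ with $m_T = 16000$ give the event counts (15, 5, 3, 15977), which sum to $m_T$ as they should.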
Let us now derive the probabilities of these four events, considering restriction enzymes with a unique recognition sequence. Let $u_i$ be the probability that a sequence of $r$ nucleotides at a location of the DNA differs from the recognition sequence by $i$ nucleotides, $p$ be the probability that a restriction site in a DNA sequence disappears during $t$ years, and $q_i$ be the probability that a site (a sequence of $r$ nucleotides) that originally differed from the recognition sequence by $i$ nucleotides becomes a restriction site during $t$ years. Since the expected number of restriction sites remains constant over time, the sites lost must balance the sites gained, so that

$$\alpha p = \sum_i u_i q_i.$$

The probability that a sequence of $r$ nucleotides in the DNA is a restriction site for both X and Y is then

$$\alpha(1-p)^2 + \sum_i u_i q_i^2 = \alpha\left[(1-p)^2 + \sum_i u_i q_i^2/\alpha\right].$$

This may be written as $\alpha S$, where $S = (1-p)^2 + \sum_i u_i q_i^2/\alpha$. The probability that the same nucleotide sequence is a restriction site for X but not for Y is (using $\alpha p = \sum_i u_i q_i$)

$$\alpha p(1-p) + \sum_i u_i q_i(1-q_i) = \alpha\left[1 - \left((1-p)^2 + \sum_i u_i q_i^2/\alpha\right)\right] = \alpha(1-S).$$

The probability of the third event is the same as that of the second. The probability that the sequence is a restriction site for neither X nor Y is

$$\alpha p^2 + \sum_i u_i(1-q_i)^2 = 1 - \alpha\left[2 - (1-p)^2 - \sum_i u_i q_i^2/\alpha\right] = 1 - \alpha(2-S),$$

since $\sum_i u_i = 1 - \alpha$. In these probabilities $S$ may be written as $(1-\pi)^r$, where $\pi$ is the probability that sequences X and Y have different nucleotides at a given nucleotide position. This $\pi$ is related to the expected number of nucleotide substitutions per site ($\delta$) by
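For a single class of enzymes sharing the same $r$, the four probabilities can be checked numerically and turned into a simple estimate. The sketch below rests on an added assumption not stated in the text: the $m_T$ possible sites are treated as independent trials with probabilities $\alpha S$, $\alpha(1-S)$, $\alpha(1-S)$ and $1-\alpha(2-S)$, with $\alpha$ unknown. Writing $a = \alpha S$ and $b = \alpha(1-S)$ turns these into $(a, b, b, 1-a-2b)$, whose maximum likelihood estimates are $\hat a = m_{xy}/m_T$ and $2\hat b = (m_x + m_y - 2m_{xy})/m_T$; hence $\hat S = \hat a/(\hat a + \hat b) = 2m_{xy}/(m_x + m_y)$ and $\hat\pi = 1 - \hat S^{1/r}$. This is only an illustration for one value of $r$, not the paper's estimator for enzymes with different $r$, and converting $\hat\pi$ to $\delta$ needs the relation between $\pi$ and $\delta$, which is not reproduced here. All function names are hypothetical.

```python
import math


def event_probabilities(alpha: float, S: float) -> tuple:
    """Probabilities of events (1) both, (2) X only, (3) Y only, (4) neither."""
    one_only = alpha * (1.0 - S)
    return alpha * S, one_only, one_only, 1.0 - alpha * (2.0 - S)


def S_from_pi(pi: float, r: int) -> float:
    """S = (1 - pi)^r: all r positions of a site must be identical in X and Y."""
    return (1.0 - pi) ** r


def log_likelihood(alpha, S, m_x, m_y, m_xy, m_T):
    """Log-likelihood of the four event counts, treating sites as independent
    trials (an assumption of this sketch). Requires 0 < alpha and 0 < S < 1."""
    n_both = m_xy
    n_one = m_x + m_y - 2 * m_xy  # events (2) and (3) pooled: same probability
    n_neither = m_T - m_x - m_y + m_xy
    p_both, p_one, _, p_neither = event_probabilities(alpha, S)
    return (n_both * math.log(p_both)
            + n_one * math.log(p_one)
            + n_neither * math.log(p_neither))


def estimate(m_x, m_y, m_xy, m_T, r):
    """Estimates of alpha, S and pi for enzymes that all recognize r nucleotides."""
    alpha_hat = (m_x + m_y) / (2.0 * m_T)  # equals a_hat + b_hat
    S_hat = 2.0 * m_xy / (m_x + m_y)       # equals a_hat / (a_hat + b_hat)
    pi_hat = 1.0 - S_hat ** (1.0 / r)      # invert S = (1 - pi)^r
    return alpha_hat, S_hat, pi_hat


alpha_hat, S_hat, pi_hat = estimate(20, 18, 15, 16000, r=6)  # hypothetical counts
print(alpha_hat, S_hat, pi_hat)
```

With hypothetical counts $m_x = 20$, $m_y = 18$, $m_{xy} = 15$ and $r = 6$, $\hat S = 30/38 \approx 0.79$. Note that $m_T$ affects only $\hat\alpha$, not $\hat S$ or $\hat\pi$, which is why this single-$r$ estimate needs nothing beyond the three site counts.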

Similar resources

Evolutionary change of restriction cleavage sites and phylogenetic inference for man and apes.

A mathematical theory for the evolutionary change of restriction endonuclease cleavage sites is developed, and the probabilities of various types of restriction-site changes are evaluated. A computer simulation is also conducted to study properties of the evolutionary change of restriction sites. These studies indicate that parsimony methods of constructing phylogenetic trees often make erroneo...

Human mitochondrial DNA variation and evolution: analysis of nucleotide sequences from seven individuals.

We have analyzed nucleotide sequence variation in an approximately 900-base pair region of the human mitochondrial DNA molecule encompassing the heavy strand origin of replication and the D-loop. Our analysis has focused on nucleotide sequences available from seven humans. Average nucleotide diversity among the sequences is 1.7%, several-fold higher than estimates from restriction endonuclease ...

Nonneutral mitochondrial DNA variation in humans and chimpanzees.

We sequenced the NADH dehydrogenase subunit 3 (ND3) gene from a sample of 61 humans, five common chimpanzees, and one gorilla to test whether patterns of mitochondrial DNA (mtDNA) variation are consistent with a neutral model of molecular evolution. Within humans and within chimpanzees, the ratio of replacement to silent nucleotide substitutions was higher than observed in comparisons between s...

Enzymatic Cleavage of Type II Restriction Endonucleases on the 2′-O-Methyl Nucleotide and Phosphorothioate Substituted DNA

The effects of nucleotide analogue substitution on the cleavage efficiencies of type II restriction endonucleases have been investigated. Six restriction endonucleases (EcoRV, SpeI, XbaI, XhoI, PstI and SphI) were investigated respectively regarding their cleavage when substrates were substituted by 2'-O-methyl nucleotide (2'-OMeN) and phosphorothioate (PS). Substitutions were made in the recog...

Journal: Genetics, vol. 105, pp. 207–217 (September 1983)

Publication date: 2003